Introduction


We compare the series Game of Thrones and Xena a Princesa Guerreira to answer:

  • Which of the two is rated higher by users?
    • Does the difference hold across seasons?
    • Is there an effect of an episode being at the start/end versus the middle of the season?

Dataset used

This is an exploratory analysis of IMDB data on the series Game of Thrones and Xena a Princesa Guerreira. The original data and variables come from this repository (https://github.com/nazareno/imdb-series), which explains how the data were generated and what each variable means.

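# Packages used throughout the analysis
library(tidyverse)
library(here)
library(plotly)
theme_set(theme_bw())

# Read the IMDB episode data and keep only the two series being compared
episodes <- read_csv(here("data/series_from_imdb.csv"), 
                    progress = FALSE,
                    col_types = cols(.default = col_double(), 
                                     series_name = col_character(), 
                                     episode = col_character(), 
                                     url = col_character(),
                                     season = col_character())) %>% 
    filter(series_name %in% c("Game of Thrones","Xena a Princesa Guerreira")) 
# Quick look at the columns and their types
episodes %>% 
    glimpse()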
Observations: 194
Variables: 18
$ series_name <chr> "Xena a Princesa Guerreira", "Xena a Princesa Guerreira", "Xena a Pri...
$ episode     <chr> "Sins of the Past", "Chariots of War", "Dreamworker", "Cradle of Hope...
$ series_ep   <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20...
$ season      <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",...
$ season_ep   <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20...
$ url         <chr> "http://www.imdb.com/title/tt0394990/", "http://www.imdb.com/title/tt...
$ user_rating <dbl> 7.9, 7.4, 7.7, 7.4, 7.5, 7.7, 7.5, 8.0, 7.8, 7.6, 7.1, 7.2, 6.9, 8.2,...
$ user_votes  <dbl> 440, 339, 318, 297, 288, 282, 270, 303, 278, 287, 271, 269, 271, 266,...
$ r1          <dbl> 0.003623188, 0.025641026, 0.064516129, 0.023474178, 0.003759398, 0.04...
$ r2          <dbl> 0.04347826, 0.03846154, 0.03548387, 0.02347418, 0.02255639, 0.0405405...
$ r3          <dbl> 0.010869565, 0.038461538, 0.029032258, 0.004694836, 0.000000000, 0.00...
$ r4          <dbl> 0.007246377, 0.034188034, 0.022580645, 0.023474178, 0.003759398, 0.01...
$ r5          <dbl> 0.018115942, 0.042735043, 0.019354839, 0.046948357, 0.045112782, 0.02...
$ r6          <dbl> 0.02536232, 0.12393162, 0.03870968, 0.06103286, 0.12781955, 0.0495495...
$ r7          <dbl> 0.08695652, 0.16239316, 0.05161290, 0.11267606, 0.13909774, 0.0765765...
$ r8          <dbl> 0.1086957, 0.1666667, 0.1354839, 0.2112676, 0.1804511, 0.1171171, 0.0...
$ r9          <dbl> 0.15579710, 0.09829060, 0.17096774, 0.18309859, 0.13909774, 0.2207207...
$ r10         <dbl> 0.5398551, 0.2692308, 0.4322581, 0.3098592, 0.3383459, 0.4054054, 0.4...

Mid-season episodes

To make the discussion more interesting, we derive a new variable: "Is the episode part of the middle of the season?" (middle_eps). An episode belongs to the middle of the season if it is among the central 60% of that season's episodes, that is, if its position in the season lies strictly between the 20th and 80th percentiles of the episode positions.

# Per series and season: number of episodes and the 20th/80th percentiles
# of the episode positions 1..n
sumario_simples <- 
    episodes %>% 
    select(season_ep,season,series_name) %>%
    group_by(series_name,season) %>% 
    summarise(n = n(),
               p20 = quantile(seq(from=1, to=n, by=1), 0.20),
               p80 = quantile(seq(from=1, to=n, by=1), 0.80))
# Attach the cut-offs to each episode and flag those strictly between them
episodes <- left_join(episodes, sumario_simples,
                      by = c("series_name", "season")) %>% 
    group_by(series_name, season) %>%
    mutate(middle_eps = (season_ep > p20) &
               (season_ep < p80)) %>% 
    ungroup()
# Inspect the new flag
episodes %>% 
    select(series_name, series_ep, middle_eps)
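
To see what the 20%/80% cut-offs mean in practice, the same rule can be applied to a single season in isolation. The sketch below uses a hypothetical helper, is_middle(), which is not part of the original analysis; it assumes episodes are numbered 1..n within the season and relies on the default behaviour of base R's quantile().

```r
# Hypothetical helper (not in the original analysis): flags the positions
# 1..n_eps that lie strictly between the 20th and 80th percentiles.
is_middle <- function(n_eps) {
  positions <- seq_len(n_eps)
  p20 <- unname(quantile(positions, 0.20))
  p80 <- unname(quantile(positions, 0.80))
  (positions > p20) & (positions < p80)
}

# A 10-episode season: the cut-offs are 2.8 and 8.2,
# so episodes 3 to 8 count as mid-season.
which(is_middle(10))
# 3 4 5 6 7 8
```

Because the comparison is strict and the cut-offs are interpolated, the flagged share is only approximately 60% for season lengths where the percentiles do not fall between whole episode numbers.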

So, which one did better?


Let us compare the ratings given to the episodes of the two series across their 6 seasons:


# Margins (in pixels) for the interactive plotly layout
m <- list(
  b = 100,
  r = 185,
  t = 75
  )
# One point per episode, coloured by whether it falls in the middle of its season
p <- episodes %>% 
      ggplot(aes(x = series_name, y = user_rating, 
                 color=middle_eps,
                 group=episode, text = paste(
                    "Series:", series_name,
                    "\nEpisode:", episode,
                    "\nRating:", user_rating
                     ))) + 
        geom_jitter(width = 0.3, alpha=0.7) +
        facet_wrap(~ season) +
        xlab("") +
        ylab("User rating") +
        # theme() rather than theme_update(): the latter changes the global theme
        # and returns the previous one, which would then be added to this plot
        theme(axis.text.x = element_text(angle = 90, hjust = 1),
              plot.title = element_text(hjust = -1)) +
        scale_x_discrete(labels=c("GOT", "Xena")) +
        labs(color='Mid-season?') +
        ggtitle("GOT x Xena (Season by Season)")
ggplotly(p, tooltip = "text") %>%
  layout(autosize = FALSE, margin = m)

The plot shows that across all six seasons (less unanimously in seasons 5 and 6), whether the episodes come from the \(\color{red}{\text{start/end of the season}}\) or from the \(\color{blue}{\text{middle of the season}}\), Game of Thrones (GOT) has higher ratings than Xena a Princesa Guerreira (Xena).
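
A numeric summary can complement the visual reading of the plot. The sketch below is one possible way to do it, not something the original analysis ran: summarise_ratings() is a hypothetical helper that computes the median user_rating per season, series and middle_eps flag, and the toy rows are made up purely to show the shape of the output (they are not IMDB values).

```r
library(dplyr)

# Hypothetical helper: median rating and episode count for each
# combination of season, series and mid-season flag.
summarise_ratings <- function(df) {
  df %>%
    group_by(season, series_name, middle_eps) %>%
    summarise(median_rating = median(user_rating),
              n_eps = n()) %>%
    ungroup()
}

# Made-up toy rows (NOT real ratings), only to illustrate the output shape
toy <- tibble(
  series_name = c("A", "A", "B", "B"),
  season      = c("1", "1", "1", "1"),
  middle_eps  = c(TRUE, TRUE, FALSE, FALSE),
  user_rating = c(8, 9, 7, 7.5)
)
summarise_ratings(toy)
```

Applied to the real data as summarise_ratings(episodes), this would give one row per season, series and mid-season flag, so the medians of the two series can be compared directly within each season.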
